Circuit for Shor's algorithm using 2n+3 qubits 
Abstract

We try to minimize the number of qubits needed to factor an integer
of n bits using Shor's algorithm on a quantum computer. We introduce
a circuit which uses 2n + 3 qubits and $O(n^3 \lg(n))$ elementary quantum
gates in a depth of $O(n^3)$ to implement the factorization algorithm.
The circuit is computable in polynomial time on a classical computer
and is completely general as it does not rely on any property of the
number to be factored.

1 Introduction 

Since Shor discovered a polynomial time algorithm for factorization on a
quantum computer [1], a lot of effort has been directed towards building a
working quantum computer. Despite all these efforts, it is still extremely
difficult to control even a few qubits. It is thus of great interest to study
exactly how few qubits are needed to factor an n-bit number.

Quantum factorization consists of classical preprocessing, a quantum
algorithm for order-finding and classical postprocessing (fig. 1). We
will concentrate on the quantum part of factorization and consider classical
parts as being free as long as they are computable in polynomial time. The
only use of quantum computation in Shor's algorithm is to find the order of
a modulo N, where N is an n-bit integer that we want to factor. The order
r of a modulo N is the least positive integer such that
$a^r \equiv 1 \pmod{N}$.

For completeness, we now give the full algorithm for factoring N as given 
in 0: 

1. If N is even, return the factor 2.

2. Classically determine if $N = p^q$ for $p \geq 1$ and $q \geq 2$ and if
so return the factor p (this can be done in polynomial time).

3. Choose a random number a such that 1 < a < N - 1. Using Euclid's
algorithm, determine if gcd(a, N) > 1 and if so, return the factor
gcd(a, N).

4. Use the order-finding quantum algorithm to find the order r of a
modulo N.

5. If r is odd or r is even but $a^{r/2} \equiv -1 \pmod{N}$, then go to
step 3. Otherwise, compute $\gcd(a^{r/2} - 1, N)$ and
$\gcd(a^{r/2} + 1, N)$. Test to see if one of these is a non-trivial
factor of N, and return the factor if so.

Figure 1: The order-finding circuit for quantum factorization. $U_a$
implements $|x\rangle \to |(ax) \bmod N\rangle$ and the measurements
followed by classical postprocessing yield the order r of a modulo N with
good probability.
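The five steps listed above can be sketched classically as follows. This is
only an illustrative sketch: the quantum order-finding of step 4 is replaced
by a brute-force classical loop, the names `is_perfect_power`,
`order_classical` and `factor` are hypothetical helpers that do not come from
the paper, and the code assumes that N is a small composite integer.

```python
import math
import random


def is_perfect_power(N):
    """Step 2: return p if N = p**q with p >= 2 and q >= 2, else None."""
    for q in range(2, N.bit_length() + 1):
        p = round(N ** (1.0 / q))
        for cand in (p - 1, p, p + 1):  # guard against float rounding
            if cand >= 2 and cand ** q == N:
                return cand
    return None


def order_classical(a, N):
    """Stand-in for step 4: least r > 0 with a**r = 1 (mod N).

    Assumes gcd(a, N) = 1. A quantum computer would do this step.
    """
    r, y = 1, a % N
    while y != 1:
        y = (y * a) % N
        r += 1
    return r


def factor(N, rng=random):
    """Return a non-trivial factor of a composite integer N."""
    if N % 2 == 0:                       # step 1
        return 2
    p = is_perfect_power(N)              # step 2
    if p is not None:
        return p
    while True:
        a = rng.randrange(2, N - 1)      # step 3: 1 < a < N - 1
        g = math.gcd(a, N)
        if g > 1:
            return g
        r = order_classical(a, N)        # step 4 (classical stand-in)
        if r % 2 == 1:                   # step 5: r odd, pick a new a
            continue
        h = pow(a, r // 2, N)
        if h == N - 1:                   # a^(r/2) = -1 (mod N), retry
            continue
        for f in (math.gcd(h - 1, N), math.gcd(h + 1, N)):
            if 1 < f < N:
                return f
```

For example, `factor(15)` returns 3 or 5 depending on the random choice of a.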

It can be shown that with probability at least one half, r will be even
and $a^{r/2} \not\equiv -1 \pmod{N}$. The quantum part of the algorithm
(step 4) is known to be computable in polynomial time on a quantum computer.
Using classical techniques, it is straightforward to build the order-finding
circuit (fig. 1) using a polynomial number of elementary gates and a linear
number of qubits. Because the depth of the circuit is related to its running
time, it
is desirable to minimize this depth, and much progress has been made in that 
direction [I]. We propose to take the problem from the other side: by how 
much can the number of qubits be reduced for factorization in polynomial 
time? Answering this question would give insights on the size of a quantum 
computer useful for factorization. We thus introduce a new order-finding 
circuit focused on reducing the number of qubits while still using only a 
polynomial number of elementary quantum gates. We also try, to some extent,
to minimize the depth of the circuit, but very little parallelization is
available since we avoid using any unnecessary qubit.
Figure 2: The quantum addition as described by Draper.



2 The Circuit 

The circuit for factorization that will be discussed here was inspired in part
by a circuit from Vedral, Barenco and Ekert. To reduce the number
of qubits, we use a variant of a quantum addition algorithm described by
Draper (fig. 2). Other techniques used to reduce the number of qubits
are the hardwiring of classical values and the sequential computation of the
Fourier transform.

The quantum addition of figure 2 takes as input n qubits representing
a number a, and n more qubits containing the quantum Fourier transform
of another number b, denoted by $\phi(b)$. After the addition, the first
register keeps the same value a but the bottom register now contains the
quantum Fourier transform of $(a + b) \bmod 2^n$, denoted by $\phi(a + b)$.




Figure 3: The circuit for addition of a classical value a to the quantum value b
in the Fourier space. The gates $A_j$ are classically computed combinations of
phase shifts.



2.1 The adder gate 

Adding together two quantum registers is, however, more than we ask for.
We are trying to find the period of the function $a^x \bmod N$ where a is a
classical random number smaller than N. Since a is classical, we only need
to be able to add a classical value to a quantum register. We can thus change
the qubits representing a in figure 2 to classical bits. The controlled gates
are then classically controlled, and since we know what a is beforehand, we
might as well precompute the product of all gates on each single qubit and
apply only one gate for every single qubit. These are one-qubit gates, which
also makes them easier to implement.

Since the addition takes place in the Fourier space, we will call this circuit
the $\phi$ADD(a) gate where a is the classical value added to the quantum
register (fig. 3). Notice the thick black bar on the right, used to distinguish
the gate from its unitary inverse. In order to prevent overflow, we need n + 1
qubits for the quantum register instead of n, so that $\phi(b)$ is effectively
the QFT of an (n + 1)-qubit register containing an n-bit number (thus the most
significant qubit before the QFT preceding the addition is always
$|0\rangle$).

If we apply the unitary inverse of the $\phi$ADD(a) gate with input
$\phi(b)$, we get either $\phi(b - a)$ if $b \geq a$, or
$\phi(2^{n+1} - (a - b))$ if b < a. Thus if b < a, the most significant qubit
of the result is always $|1\rangle$, whereas it is always $|0\rangle$ if
$b \geq a$. This reverse $\phi$ADD(a) gate can be useful for subtraction and
comparison purposes (fig. 4) and we use a black bar on the left to distinguish
it from the regular gate. The unitary inverse of a circuit is obtained by
applying the unitary inverse of each elementary gate in reverse order.
Figure 4: The effect of the reverse $\phi$ADD(a) gate on $|\phi(b)\rangle$.
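The behaviour of the adder gate and of its reverse can be checked numerically
on a small register. The sketch below assumes NumPy and works directly on
state vectors: `qft_state`, `phi_add` and `inverse_qft_value` are hypothetical
helper names, and the whole diagonal phase is applied at once instead of one
phase gate per qubit (the phase on basis state k factors over the bits of k,
which is why single-qubit gates suffice in the circuit).

```python
import numpy as np


def qft_state(b, num_qubits):
    """Amplitudes of the QFT of the basis state |b> on num_qubits qubits."""
    dim = 2 ** num_qubits
    k = np.arange(dim)
    return np.exp(2j * np.pi * b * k / dim) / np.sqrt(dim)


def phi_add(state, a, inverse=False):
    """Add (or subtract, if inverse) the classical value a in Fourier space.

    Only diagonal phases are applied: basis state k picks up
    exp(+/- 2*pi*i*a*k / dim).
    """
    dim = len(state)
    k = np.arange(dim)
    sign = -1 if inverse else 1
    return state * np.exp(sign * 2j * np.pi * a * k / dim)


def inverse_qft_value(state):
    """Apply the inverse QFT and return the basis index with largest weight."""
    dim = len(state)
    j, k = np.meshgrid(np.arange(dim), np.arange(dim), indexing="ij")
    inverse_qft = np.exp(-2j * np.pi * j * k / dim) / np.sqrt(dim)
    amplitudes = inverse_qft @ state
    return int(np.argmax(np.abs(amplitudes)))


n = 3                       # b and a are n-bit numbers
m = n + 1                   # one extra qubit prevents overflow
total = inverse_qft_value(phi_add(qft_state(5, m), 6))
wrapped = inverse_qft_value(phi_add(qft_state(5, m), 6, inverse=True))
msb_of_wrapped = wrapped >> n   # 1 signals b < a
```

With b = 5 and a = 6 on n + 1 = 4 qubits, the forward gate gives 11, and the
reverse gate gives $2^{4} - 1 = 15$, whose most significant bit is 1 because
b < a.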



2.2 The modular adder gate 

Now that we have a $\phi$ADD(a) gate, we can use it to build a modular adder
gate (fig. 5). For future use, two control qubits are included in the circuit.
For the modular adder gate, we need to compute a + b and subtract N
if $a + b \geq N$. However, it is not so easy to implement this operation in
a reversible way. The input to the $\phi$ADD(a)MOD(N) gate is $\phi(b)$ with
b < N, and the classical number a that we add is also smaller than N.

Figure 5: The doubly controlled $\phi$ADD(a)MOD(N) gate with
$c_1 = c_2 = 1$. If either of the control qubits is in state $|0\rangle$, the
output of the gate is $|\phi(b)\rangle$ since b < N.



We begin by applying a $\phi$ADD(a) gate to the register $\phi(b)$. The
quantum register now contains $\phi(a + b)$ with no overflow because we were
careful enough to put an extra qubit in state $|0\rangle$ along with the value
b before applying the QFT. We next run a reverse $\phi$ADD(N) to get
$\phi(a + b - N)$. If a + b < N, we did not have to subtract N but now we can
determine if a + b < N by checking the most significant bit of a + b - N.
However, to access this most significant bit we need to invert the QFT on the
whole register containing $\phi(a + b - N)$. We can then use this qubit as the
controlling qubit of a controlled-not gate acting on an ancillary qubit. It is
then possible to reapply the QFT and use this ancilla as a control qubit for a
controlled $\phi$ADD(N) gate, so that if a + b < N we add back the value N
that we subtracted earlier. We now have $\phi((a + b) \bmod N)$ in the
register, and we are done except for the ancilla which is now a junk bit. We
have to restore it to $|0\rangle$ somehow, otherwise the computation will not
be clean and the algorithm will not work (see footnote a).

Restoring the ancilla to $|0\rangle$ is no easy task if we do not want to
waste qubits. We can still do it by using the identity:

$(a + b) \bmod N \geq a \iff a + b < N.$ (1)

Hence, we only have to compare $(a + b) \bmod N$ with the value a using
essentially the same trick as before. We run an inverse $\phi$ADD(a) followed
by an inverse QFT to get the most significant qubit of
$(a + b) \bmod N - a$. This qubit is $|0\rangle$ if
$(a + b) \bmod N \geq a$. We apply a NOT gate on this qubit and use it as the
controlling qubit of a controlled-not gate targeting the ancilla. The ancilla
is thus restored to $|0\rangle$ and we can apply a NOT gate again on the
control wire, followed by a QFT and a $\phi$ADD(a) gate on the quantum
register. After this, we have a clean computation of $(a + b) \bmod N$
in the Fourier space.
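The sequence of steps of the modular adder can be followed with plain integer
bookkeeping. The sketch below is not a quantum simulation: it only tracks the
integer encoded in the (n + 1)-qubit register modulo $2^{n+1}$ together with
the ancilla bit, and the function name `modular_adder_trace` is a
hypothetical label introduced for illustration.

```python
def modular_adder_trace(a, b, N, n):
    """Follow the register value and the ancilla through the modular adder.

    Assumes 0 <= a < N, 0 <= b < N and N < 2**n. Returns (register, ancilla).
    """
    assert 0 <= a < N and 0 <= b < N < 2 ** n
    modulus = 2 ** (n + 1)          # the register has n + 1 qubits

    def msb(value):
        return (value >> n) & 1     # most significant of the n + 1 bits

    ancilla = 0
    reg = (b + a) % modulus         # phiADD(a)
    reg = (reg - N) % modulus       # reverse phiADD(N)
    ancilla ^= msb(reg)             # inverse QFT, CNOT from the MSB, QFT
    if ancilla:                     # controlled phiADD(N): add N back
        reg = (reg + N) % modulus
    # At this point reg == (a + b) % N but the ancilla may hold junk.
    reg = (reg - a) % modulus       # reverse phiADD(a)
    ancilla ^= 1 - msb(reg)         # NOT, CNOT onto the ancilla, NOT
    reg = (reg + a) % modulus       # phiADD(a) restores (a + b) mod N
    return reg, ancilla
```

Looping over all a, b < N for a small N (for instance N = 15 with n = 4)
shows that the register always ends in (a + b) mod N and that the ancilla
always returns to 0, in agreement with identity (1).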

Again, what we need exactly is a doubly controlled version of the
$\phi$ADD(a)MOD(N) gate. In order to reduce the complexity of the circuit,
we will doubly control only the $\phi$ADD(a) gates instead of all the gates
(fig. 5). If the $\phi$ADD(a) gates are not performed, it is easy to verify
that the rest of the circuit implements the identity on all qubits because
b < N.

Footnote a: Indeed, for the order-finding algorithm to work, we need to find
the period of $a^x \bmod N$ but the period of the garbage bits can be
something else.

2.3 The controlled multiplier gate

Figure 6: The CMULT(a)MOD(N) gate.

The next step is to use the doubly controlled $\phi$ADD(a)MOD(N) gate to
build a controlled multiplier gate that we will call CMULT(a)MOD(N)
(fig. 6). This gate takes three inputs, $|c\rangle|x\rangle|b\rangle$, and its
output depends on the qubit $|c\rangle$. If $|c\rangle = |1\rangle$, the
output is $|c\rangle|x\rangle|b + (ax) \bmod N\rangle$. If
$|c\rangle = |0\rangle$, then the input is unchanged and stays
$|c\rangle|x\rangle|b\rangle$. This gate is very straightforward
to implement using doubly controlled $\phi$ADD(a)MOD(N) gates. We use
the identity:

$(ax) \bmod N = (\ldots((2^0 a x_0) \bmod N + 2^1 a x_1) \bmod N + \ldots + 2^{n-1} a x_{n-1}) \bmod N.$ (2)

Thus we only need n successive doubly controlled modular adder gates,
each of them adding a different value $(2^j a) \bmod N$ with $0 \leq j < n$
to get the CMULT(a)MOD(N) gate. We now have a controlled gate that takes
$|x\rangle|b\rangle$ to $|x\rangle|b + (ax) \bmod N\rangle$. What we would
need instead is a controlled gate that takes $|x\rangle$ to
$|(ax) \bmod N\rangle$. This can however be obtained by a clever
trick from reversible computing that uses two controlled multiplication gates
(fig. 7).
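Identity (2) translates directly into a loop of conditional modular
additions, one per bit of x. The sketch below works on classical integers
only and is meant as an arithmetic check of the identity; the function name
`cmult_by_additions` is a hypothetical label.

```python
def cmult_by_additions(c, x, b, a, N, n):
    """Return (c, x, (b + a*x) mod N) if c == 1, else the inputs unchanged.

    Assumes x < 2**n and b < N. Each loop iteration plays the role of one
    doubly controlled modular adder, controlled by c and by bit j of x.
    """
    for j in range(n):
        x_j = (x >> j) & 1
        if c and x_j:
            constant = (pow(2, j, N) * a) % N   # classical value (2^j a) mod N
            b = (b + constant) % N
    return c, x, b
```

For instance, with a = 4, N = 15 and n = 4, the inputs c = 1, x = 7, b = 3
give b = (3 + 28) mod 15 = 1, while c = 0 leaves b = 3 untouched.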

We first apply the CMULT(a)MOD(N) gate to $|c\rangle|x\rangle|0\rangle$. We
follow with a SWAP between the two registers if the qubit
$|c\rangle = |1\rangle$ (that is effectively a controlled-SWAP on the
registers; see footnote b). We only need to control-SWAP n qubits, not n + 1.
Indeed, the most significant qubit of $(ax) \bmod N$ will always be 0 since we
were careful to include one extra qubit to store the overflow in the
$\phi$ADD(a) gate. We then finish with the inverse of a
CMULT($a^{-1}$)MOD(N) circuit. The value $a^{-1}$, which is the inverse of a
modulo N, is computable classically in polynomial time using Euclid's
algorithm and it always exists since gcd(a, N) = 1. The fact that we apply
the inverse of the circuit means that the circuit effectively takes
$|c\rangle|x\rangle|b\rangle$ to
$|c\rangle|x\rangle|(b - a^{-1}x) \bmod N\rangle$.

Footnote b: We can do without the SWAP by modifying all later gates
accordingly, but the SWAP simplifies the layout of the circuit without
affecting the order of the complexity.

Figure 7: The controlled-$U_a$ gate.

The resulting gate will be called C-$U_a$ for controlled-$U_a$. It does
nothing if $|c\rangle = |0\rangle$ but if $|c\rangle = |1\rangle$, then the
two registers take the following values:

$|x\rangle|0\rangle \to |x\rangle|(ax) \bmod N\rangle \to |(ax) \bmod N\rangle|x\rangle \to |(ax) \bmod N\rangle|(x - a^{-1}ax) \bmod N\rangle = |(ax) \bmod N\rangle|0\rangle.$ (3)
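The three stages of equation (3) can be checked on classical integers. In
this sketch the multiplier gates are modelled by their arithmetic effect
only, the function names are hypothetical, and the modular inverse is
obtained with Python's built-in `pow(a, -1, N)` (available from Python 3.8)
instead of an explicit Euclid routine.

```python
def cmult_arith(c, x, b, a, N):
    """Arithmetic effect of CMULT(a)MOD(N): b -> (b + a*x) mod N if c == 1."""
    return (c, x, (b + a * x) % N) if c else (c, x, b)


def cmult_arith_inverse(c, x, b, a, N):
    """Arithmetic effect of the inverse circuit: b -> (b - a*x) mod N."""
    return (c, x, (b - a * x) % N) if c else (c, x, b)


def controlled_u(c, x, a, N):
    """Follow |c>|x>|0> through CMULT, controlled-SWAP and inverse CMULT.

    Assumes gcd(a, N) == 1 and x < N. Returns (c, top register, bottom).
    """
    a_inv = pow(a, -1, N)                        # inverse of a modulo N
    c, x, b = cmult_arith(c, x, 0, a, N)         # |x>|0> -> |x>|ax mod N>
    if c:
        x, b = b, x                              # controlled-SWAP
    c, x, b = cmult_arith_inverse(c, x, b, a_inv, N)
    return c, x, b
```

With a = 4 and N = 15, `controlled_u(1, 7, 4, 15)` yields the top register
(4 * 7) mod 15 = 13 and a bottom register back at 0, while a control bit of 0
leaves x unchanged.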

Since the bottom register returns to $|0\rangle$ after the computation, we can
consider this extra register as being part of the C-$U_a$ gate, thus the gate
effectively takes $|x\rangle$ to $|(ax) \bmod N\rangle$. This is exactly the
gate we need to run the quantum order-finding circuit (fig. 1). Of course, we
don't need to apply C-$U_a$ n times to get $(C\text{-}U_a)^n$ because we can
directly run C-$U_{a^n}$ (where $a^n \bmod N$ is computed classically) which
is the same as $(C\text{-}U_a)^n$ since:

$(a^n x) \bmod N = (a \ldots (a(ax) \bmod N) \bmod N \ldots) \bmod N.$ (4)
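As a small illustration of equation (4), the classical constants that would
be hardwired into the successive controlled gates can be produced by repeated
squaring. The sketch assumes that the gates needed are those for the powers
$a^{2^i} \bmod N$, as in the order-finding circuit; the function name
`hardwired_constants` is hypothetical.

```python
def hardwired_constants(a, N, count):
    """Return [a^(2^0) mod N, a^(2^1) mod N, ...] with `count` entries."""
    constants = []
    value = a % N
    for _ in range(count):
        constants.append(value)
        value = (value * value) % N   # squaring doubles the exponent
    return constants
```

For a = 7 and N = 15 the first four constants are 7, 4, 1 and 1, so most of
the corresponding multiplications are the identity.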



2.4 The one controlling-qubit trick 

An advantage of using the C-$U_{a^{2^i}}$ gates for Shor's algorithm is the
fact that we don't really need the total 2n controlling qubits. In fact, it
can be shown that only one controlling qubit is sufficient. This is possible
because the controlled-U gates all commute and the inverse QFT can
be applied semi-classically. Indeed, we can get all the bits of the answer
sequentially as in figure 8. Each measured bit dictates which unitary
transformation we have to apply after every controlled-U step before the next
measurement. This simulates the inverse QFT followed by a measurement
on all qubits as in figure 1. We save a significant number of qubits this
way, and in fact we need only a total of 2n + 3 qubits to factor an n-bit
number as we will show in the complexity analysis section.

Figure 8: The one control qubit trick for factoring. The R gates depend on all
previous measurement results and implement the inverse QFT, while the X gates
are negations conditioned on the result of each measurement.
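The sequential readout can be illustrated on a simplified case. The sketch
below assumes that the lower register is in an eigenstate of U whose phase is
exactly an m-bit binary fraction, so that every measurement is deterministic;
in the real order-finding circuit the outcomes are random and the phase is
only approximated. The function name and the bookkeeping are hypothetical,
and the X gates that reset the control qubit are implicit since each round
starts from a fresh $|0\rangle$.

```python
import math


def iterative_phase_bits(phase, m):
    """Recover the m binary digits of `phase` = 0.b1 b2 ... bm, one per round.

    Round j applies H, controlled-U^(2^j), a rotation R fixed by the bits
    already measured, H again, and a measurement of the control qubit.
    """
    bits = {}                                   # digit index l -> measured b_l
    for j in range(m - 1, -1, -1):
        # R gate: cancel the contribution of the lower-order digits.
        omega = sum(b * 2.0 ** (j - l) for l, b in bits.items())
        theta = 2 * math.pi * ((2 ** j) * phase - omega)
        p_one = math.sin(theta / 2) ** 2        # probability of measuring 1
        bits[j + 1] = 1 if p_one > 0.5 else 0   # deterministic for exact phases
    return [bits[l] for l in range(1, m + 1)]
```

For a phase of 5/8 = 0.101 in binary, the three rounds return the digits
1, 0, 1 (least significant digit measured first), using a single control
qubit throughout.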




Figure 9: The exact quantum Fourier transform. H is the Hadamard gate. 



2.5 The quantum Fourier transform 

The implementation of the exact QFT on n qubits requires $O(n^2)$
operations [3] (fig. 9). However, in physical implementations, there will
always be a threshold for the precision of the gates. Since many phase shifts
will be almost negligible, we will in practice ignore the ones with k greater
than a certain threshold $k_{max}$. This approximate QFT is in fact very close
to the exact QFT even with $k_{max}$ logarithmic in n. In fact, it has been
shown [10] that the error introduced by ignoring all gates with
$k > k_{max}$ is proportional to $n 2^{-k_{max}}$. We can thus choose
$k_{max} \in O(\lg(n/\epsilon))$.

The implementation of the approximate QFT on n qubits requires
$O(n \lg(n))$ gates. There seems to be no obvious way to reduce the depth of
either the exact QFT or the approximate QFT on n qubits below $O(n)$
without using extra qubits. The depth of the QFT on n + 1 qubits is
thus $O(n)$ with the little parallelization available without extra qubits.
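The gate counts quoted above can be reproduced with a short tally. The sketch
assumes the usual textbook layout of the QFT, in which qubit i receives one
Hadamard followed by conditional phase shifts $R_k$ for k = 2, 3, ..., and
that the approximate version simply drops every $R_k$ with
$k > k_{max}$; the function name `qft_gate_count` is hypothetical.

```python
def qft_gate_count(n, k_max=None):
    """Return (hadamards, phase_shifts) for a QFT on n qubits.

    k_max=None means the exact QFT. Assumes k_max >= 1.
    """
    if k_max is None:
        k_max = n
    hadamards = n
    # Qubit i is followed by n - 1 - i rotations R_2 ... R_(n-i) in the exact
    # circuit; the approximate circuit keeps at most k_max - 1 of them.
    phase_shifts = sum(min(n - 1 - i, k_max - 1) for i in range(n))
    return hadamards, phase_shifts
```

The exact count of phase shifts is n(n - 1)/2, which is $O(n^2)$, while a
fixed or logarithmic $k_{max}$ keeps at most $n (k_{max} - 1)$ of them.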



2.6 The controlled-SWAP 




Figure 10: The controlled-SWAP gate. 



The controlled-SWAP on one qubit is very easy to implement (fig. 10).
Only two controlled-not and one Toffoli are needed to perform the SWAP
on two qubits controlled by a third. Thus, O(n) gates are needed to
control-SWAP n qubits, that is, swap n qubits with n others with one control
qubit.
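One arrangement of two controlled-nots and one Toffoli that realizes the
controlled-SWAP can be verified on classical bits. The ordering used below
(controlled-not, Toffoli, controlled-not) is a standard construction and is
assumed here; the figure itself is not available to confirm that the paper
uses exactly this layout.

```python
from itertools import product


def cnot(control, target):
    """Controlled-not on classical bits: returns (control, target)."""
    return control, target ^ control


def toffoli(control1, control2, target):
    """Doubly controlled not on classical bits."""
    return control1, control2, target ^ (control1 & control2)


def controlled_swap(c, a, b):
    """Swap a and b if c == 1, using two CNOTs and one Toffoli."""
    b, a = cnot(b, a)            # a <- a XOR b
    c, a, b = toffoli(c, a, b)   # b <- b XOR (c AND a)
    b, a = cnot(b, a)            # a <- a XOR b
    return c, a, b


# Exhaustive check over the eight classical inputs.
all_correct = all(
    controlled_swap(c, a, b) == ((c, b, a) if c else (c, a, b))
    for c, a, b in product((0, 1), repeat=3)
)
```

Since the construction is a permutation of basis states, checking the eight
classical inputs is enough to confirm the gate.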



3 Complexity Analysis 

We now analyze the complexity of the given circuit for performing
factorization of an n-bit number N. The analysis keeps track of the number of
qubits, the order of the number of gates and the order of the depth of the
circuit. For the depth of the circuit, we consider that it will be possible to
apply simultaneously different quantum gates that act on different qubits
of the quantum computer. However, we consider it impossible to have one
qubit controlling many operations in the same step. The circuit uses only
single qubit gates, up to doubly controlled conditional phase shifts and up
to doubly controlled not gates. These gates can be implemented using a
constant number of single qubit gates and controlled-nots, so they can
all be considered as elementary quantum gates.

The $\phi$ADD(a) circuit (fig. 3), where a is a classical value, requires
n + 1 qubits and $O(n)$ single qubit gates in constant depth. The number of
qubits is n + 1 because we need an extra qubit to prevent overflows. When a
control qubit is added to the circuit, the depth becomes $O(n)$ since the
conditional phase shifts have to be done sequentially. Indeed, the control
qubit has to control each phase shift one at a time. The doubly controlled
$\phi$ADD(a)MOD(N) circuit (fig. 5) requires n + 4 qubits. It also
requires $O(n k_{max})$ gates, but has a depth of only $O(n)$ regardless of
$k_{max}$ because the QFTs can be somewhat parallelized. The CMULT(a)MOD(N)
circuit is only n doubly controlled $\phi$ADD(a)MOD(N) circuits. It thus
takes 2n + 3 qubits, $O(n^2 k_{max})$ gates and a depth of $O(n^2)$ to
implement the CMULT(a)MOD(N) circuit. Two of these circuits along with the
controlled-SWAP are needed for the C-$U_a$ circuit. The controlled-SWAP on
n qubits requires only $O(n)$ gates and depth, so the C-$U_a$ circuit requires
2n + 3 qubits, $O(n^2 k_{max})$ gates and a depth of $O(n^2)$ again.

For the whole order-finding circuit, that is, the whole quantum part of
Shor's algorithm, we need 2n of these C-$U_a$ circuits. The quantum resources
needed are thus 2n + 3 qubits, $O(n^3 k_{max})$ gates and a depth of
$O(n^3)$. If we decide to use the exact QFT in the additions, then we would
have $k_{max} = n$. As we argued earlier, this would not be clever because the
implementation is sure to have hardware errors anyway. We thus should use the
approximate QFT with $k_{max} = O(\lg(n/\epsilon))$, so that the number of
gates is in $O(n^3 \lg(n))$ for any $\epsilon$ polynomial in $1/n$.
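The resource estimates can be tabulated for a given n. The sketch below only
reproduces the leading-order scalings quoted in this section with all
constant factors dropped, so `gate_scale` and `depth_scale` are not actual
gate counts; the function name and the rounding of $k_{max}$ to
$\lceil \lg(n/\epsilon) \rceil$ are assumptions made for illustration.

```python
import math


def order_finding_resources(n, epsilon=None):
    """Leading-order resources of the order-finding circuit for n-bit N.

    epsilon=None selects the exact QFT (k_max = n); otherwise k_max is taken
    as ceil(lg(n / epsilon)), one possible choice within O(lg(n / epsilon)).
    """
    if epsilon is None:
        k_max = n
    else:
        k_max = max(1, math.ceil(math.log2(n / epsilon)))
    return {
        # n (register x) + n + 1 (register b) + 1 ancilla + 1 control qubit
        "qubits": 2 * n + 3,
        "k_max": k_max,
        "gate_scale": n ** 3 * k_max,   # O(n^3 k_max), constants dropped
        "depth_scale": n ** 3,          # O(n^3), constants dropped
    }
```

For n = 4, which covers N = 15, the qubit count is 2 * 4 + 3 = 11.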

This result of 2n + 3 qubits is slightly better than previous circuits for
factorization. Vedral, Barenco and Ekert published a circuit of 7n + 1 qubits
and $O(n^3)$ elementary gates for modular exponentiation [S]. It is mentioned
that this number can be easily reduced to 5n + 2 qubits with basic
optimization and further reduced to 4n + 3 if unbounded Toffoli gates
(n-controlled nots) are available. Beckman, Chari, Devabhaktuni and Preskill
provided an extended analysis of modular exponentiation, with a circuit of
5n + 1 qubits using elementary gates and 4n + 1 if unbounded Toffoli gates are
available. Zalka also described a method for factorization with 3n + O(lg n)
qubits using only elementary gates [H].

The availability of unbounded Toffoli gates will of course depend on 
the physical implementation of the quantum computer, but it is assumed 
throughout our design and analysis that such gates cannot be considered 
elementary. For that matter, if we do not restrict the type and size of the 
quantum gates in any way, order-finding can be achieved with n + 1 qubits 
by directly using controlled multiplication gates [5].

Of the 2n + 3 qubits used in the circuit provided here, one is used as 
an ancilla for modular addition, one is used to prevent addition overflows 
and n are used as an ancillary register to get modular multiplication from 
successive additions. An order-finding circuit using elementary gates and
fewer than 2n + O(1) qubits is not ruled out yet, but it seems that a
different method would have to be used for modular multiplication to get such
a circuit.

Fifteen is the smallest number on which Shor's algorithm can be applied.
The circuit for factorization of N = 15 uses eleven qubits as given
here. However, the classical computation performed to build it gives a lot of
information on the order of the number a. Indeed, for any 1 < a < 15 that is
coprime to 15, the order of a is either two or four. Most of the
multiplications in the circuit are simply the identity and can be removed,
which amounts to many unused qubits. The number 15 was factored using NMR with
seven qubits in an impressive display of quantum control by Vandersypen,
Steffen, Breyta, Yannoni, Sherwood and Chuang [14].
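The statement about the orders modulo 15 is easy to confirm by brute force.
The helper name `multiplicative_order` below is hypothetical, and the check
is restricted to values of a that are coprime to 15, since the order is only
defined for those.

```python
import math


def multiplicative_order(a, N):
    """Least r > 0 with a**r = 1 (mod N); assumes gcd(a, N) == 1."""
    r, y = 1, a % N
    while y != 1:
        y = (y * a) % N
        r += 1
    return r


orders_mod_15 = {
    a: multiplicative_order(a, 15)
    for a in range(2, 15)
    if math.gcd(a, 15) == 1
}
```

The resulting table maps 4, 11 and 14 to order two and 2, 7, 8 and 13 to
order four.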

The importance of reducing the number of qubits versus reducing the 
depth of a quantum computation is not clear as quantum computers of useful 
size are not yet available. We have to keep in mind that error correction will 
most probably have to be used on quantum computers, which will create an 
overhead in the number of qubits used [3]. It is however sensible to minimize
the number of qubits before applying error correction if qubits are hard to 
come by. 

4 Conclusion 

Putting together several tricks, we have developed a circuit for the quantum 
part of the factorization algorithm, that is, the order-finding algorithm, 
while focusing on reducing the number of qubits. The number of qubits 
needed is 2n + 3 and the depth is $O(n^3)$. This circuit uses slightly fewer
qubits than those previously known if restricted to elementary gates. It is
also completely general and does not rely on any properties of the number 
to be factored. 

Given the values a and N, this circuit gives the order r of a modulo 
N with good probability. Many runs of this algorithm may be needed to 
factor a number. Also, the randomly chosen value a is hardwired in the 
circuit and there is a probability (about one half) that it will be necessary 
to choose a new value a and run a new order-finding algorithm on it. This is 
not a problem if the quantum computer is a physical device where the gates 
are interactions controlled by a classical computer such as laser pulses on 
trapped ions, NMR and most implementation proposals. Indeed, the circuit 
can easily be classically computed. A quantum computer consisting of a 
physical system controlled by a classical computer is the most conceivable 
option at this point.